full batch
DAGER: Exact Gradient Inversion for Large Language Models
Federated learning works by aggregating locally computed gradients from multiple clients, thus enabling collaborative training without sharing private client data. However, prior work has shown that the data can actually be recovered by the server using so-called gradient inversion attacks. While these attacks perform well when applied to images, they are limited in the text domain and only permit approximate reconstruction of small batches and short input sequences. In this work, we propose DAGER, the first algorithm to recover whole batches of input text exactly. DAGER leverages the low-rank structure of self-attention layer gradients and the discrete nature of token embeddings to efficiently check whether a given token sequence is part of the client data. We use this check to exactly recover full batches in the honest-but-curious setting, without any prior on the data, for both encoder- and decoder-based architectures, using exhaustive heuristic search and a greedy approach, respectively. We provide an efficient GPU implementation of DAGER and show experimentally that it recovers full batches of size up to 128 on large language models (LLMs), beating prior attacks in speed (20x faster at the same batch size), scalability (10x larger batches), and reconstruction quality (ROUGE-1/2 > 0.99).
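The span check at the core of this idea can be illustrated on a single linear layer. Below is a minimal NumPy sketch with toy dimensions and a simulated gradient (an illustration of the low-rank property the abstract describes, not the authors' implementation): for a layer y = Wx, the weight gradient is a sum of rank-1 terms g_i x_i^T over the batch, so a candidate token embedding that was actually part of the input lies in the gradient's row space.

```python
import numpy as np

def in_row_span(grad_W, z, tol=1e-6):
    """Check whether candidate embedding z lies (approximately) in the
    row space of the weight gradient grad_W (shape: out_dim x in_dim).

    For a linear layer y = W x, dL/dW is a sum of rank-1 terms g_i x_i^T
    over the tokens in the batch, so its row space is spanned by the
    input embeddings x_i. A real input embedding therefore projects onto
    this row space with (near-)zero residual."""
    _, s, vt = np.linalg.svd(grad_W, full_matrices=False)
    basis = vt[s > s.max() * 1e-10]          # orthonormal basis of the row space
    residual = z - basis.T @ (basis @ z)     # component outside the row space
    return np.linalg.norm(residual) / np.linalg.norm(z) < tol

# Toy usage: 4 "token embeddings" of dim 32 fed through a 64x32 linear layer.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 32))                    # true client inputs
G = rng.normal(size=(4, 64))                    # per-token output gradients
grad_W = G.T @ X                                # simulated dL/dW (sum of rank-1 terms)
print(in_row_span(grad_W, X[0]))                # True: real token embedding
print(in_row_span(grad_W, rng.normal(size=32))) # False: random candidate
```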
Never Go Full Batch (in Stochastic Convex Optimization)
We study the generalization performance of $\text{\emph{full-batch}}$ optimization algorithms for stochastic convex optimization: these are first-order methods that only access the exact gradient of the empirical risk (rather than gradients with respect to individual data points), and include a wide range of algorithms such as gradient descent, mirror descent, and their regularized and/or accelerated variants. We provide a new separation result showing that, while algorithms such as stochastic gradient descent can generalize and optimize the population risk to within $\epsilon$ after $O(1/\epsilon^2)$ iterations, full-batch methods either need at least $\Omega(1/\epsilon^4)$ iterations or exhibit a dimension-dependent sample complexity.
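To make the separation concrete, here is a back-of-envelope comparison of the two rates, keeping only the $\epsilon$-dependence and ignoring constants and the dimension-dependent terms from the paper:

```python
# Back-of-envelope comparison of the epsilon-dependence only; constants and the
# dimension-dependent terms from the paper are ignored.
for eps in (1e-1, 1e-2, 1e-3):
    sgd_iters = 1 / eps**2          # O(1/eps^2) stochastic gradient steps suffice
    full_batch_iters = 1 / eps**4   # Omega(1/eps^4) exact-gradient iterations needed
    print(f"eps={eps:g}: SGD ~ {sgd_iters:.0e} steps, "
          f"full batch ~ {full_batch_iters:.0e} steps")
```

At $\epsilon = 10^{-3}$ this is roughly $10^6$ stochastic steps versus $10^{12}$ exact-gradient iterations, each of which additionally touches the entire dataset.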
We address the main remarks regarding the baseline, scalability, complexity, and the full-batch setting in the following paragraphs.
We thank the reviewers for their valuable comments and suggestions. The reviewers' main concern is the lack of ... (the RQVI procedure led to computational instability). ... (GLM, BNN) and five datasets (Boston, Fires, Life Expect., Frisk, and Metro) with a learning-rate analysis. We do not claim that this method is suitable for high-dimensional posteriors. It is accurate that the method will not be viable without this property.
Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
Kunstner, Frederik, Chen, Jacques, Lavington, Jonathan Wilder, Schmidt, Mark
The success of the Adam optimizer on a wide array of architectures has made it the default in settings where stochastic gradient descent (SGD) performs poorly. However, our theoretical understanding of this discrepancy is lagging, preventing the development of significant improvements on either algorithm. Recent work advances the hypothesis that Adam and other heuristics like gradient clipping outperform SGD on language tasks because the distribution of the error induced by sampling has heavy tails. This suggests that Adam outperforms SGD because it uses a more robust gradient estimate. We evaluate this hypothesis by varying the batch size, up to the entire dataset, to control for stochasticity. We present evidence that stochasticity and heavy-tailed noise are not major factors in the performance gap between SGD and Adam. Rather, Adam performs better as the batch size increases, while SGD is less effective at taking advantage of the reduction in noise. This raises the question of why Adam outperforms SGD in the full-batch setting. Through further investigation of simpler variants of SGD, we find that the behavior of Adam with large batches is similar to sign descent with momentum.
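The sign-descent-with-momentum update the abstract refers to is simple to write down; the sketch below contrasts it with plain gradient descent on a made-up ill-conditioned quadratic (toy hyperparameters, not an experiment from the paper):

```python
import numpy as np

# Made-up ill-conditioned quadratic f(w) = 0.5 * sum_i a_i * w_i**2,
# used only to contrast the two update rules.
a = np.array([100.0, 0.01])                 # very different curvature per coordinate

def grad(w):
    return a * w

w_gd, w_sign, m = np.ones(2), np.ones(2), np.zeros(2)
lr_gd, lr_sign, beta = 0.019, 0.05, 0.9     # lr_gd is capped by the stiffest coordinate

for _ in range(200):
    w_gd = w_gd - lr_gd * grad(w_gd)        # full-batch gradient descent
    m = beta * m + (1 - beta) * grad(w_sign)
    w_sign = w_sign - lr_sign * np.sign(m)  # sign descent with momentum:
                                            # the step ignores the gradient magnitude

print("GD:  ", w_gd)    # fast on the stiff coordinate, nearly frozen on the flat one
print("sign:", w_sign)  # comparable progress on both coordinates (up to oscillation)
```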
Visual interpretation of the robustness of Non-Negative Associative Gradient Projection Points over function minimizers in mini-batch sampled loss functions
Mini-batch sub-sampling is likely here to stay, due to growing data demands, memory-limited computational resources such as graphical processing units (GPUs), and the dynamics of on-line learning. Sampling a new mini-batch at every loss evaluation brings a number of benefits, but also one significant drawback: the loss function becomes discontinuous. These discontinuities are generally not problematic when using fixed learning rates or learning rate schedules typical of subgradient methods. However, they hinder attempts to directly minimize the loss function by solving for critical points, since function minimizers find spurious minima induced by the discontinuities, while critical points may not even exist. Therefore, finding function minimizers and critical points in stochastic optimization is ineffective. As a result, attention has been given to reducing the effect of these discontinuities by means such as gradient averaging or adaptive and dynamic sampling. This paper offers an alternative paradigm: recasting the optimization problem to instead find Non-Negative Associated Gradient Projection Points (NN-GPPs). We demonstrate that the NN-GPP interpretation of gradient information is more robust than critical points or minimizers, being less susceptible to sub-sampling-induced variance and eliminating spurious function minimizers. We conduct a visual investigation in which we compare function value and gradient information for a variety of popular activation functions as applied to a simple neural network training problem. Based on the improved description NN-GPPs offer over minimizers for identifying true optima, in particular when using smooth activation functions with high curvature characteristics, we postulate that locating NN-GPPs can contribute significantly to automating neural network training.
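To illustrate the discontinuity argument, the following self-contained sketch (a made-up 1-D least-squares problem, not from the paper) re-samples a mini-batch at every evaluation: the sampled loss exhibits many spurious local minima along a line in parameter space, while the sign of the sampled derivative, the kind of gradient-only information NN-GPPs build on, stays largely consistent away from the true solution.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up 1-D least-squares problem: y = 2*x + noise, 1000 data points.
x = rng.normal(size=1000)
y = 2.0 * x + 0.5 * rng.normal(size=1000)

def minibatch_eval(w, batch_size=16):
    """Loss and derivative on a freshly re-sampled mini-batch: every
    evaluation sees a different sub-sample, so the sampled loss is
    discontinuous in w."""
    idx = rng.choice(len(x), size=batch_size, replace=False)
    r = w * x[idx] - y[idx]
    return np.mean(r**2), np.mean(2 * r * x[idx])

ws = np.linspace(1.5, 2.5, 201)
losses = [minibatch_eval(w)[0] for w in ws]
grad_signs = [np.sign(minibatch_eval(w)[1]) for w in ws]

# Spurious minima: points where the re-sampled loss dips purely because of
# the particular mini-batch drawn there.
spurious = sum(1 for i in range(1, 200)
               if losses[i] < losses[i - 1] and losses[i] < losses[i + 1])
# The sign of the sampled derivative is far more consistent: negative left of
# the true solution (w = 2) and positive right of it, outside a small
# neighbourhood of the solution.
correct = np.mean([s == np.sign(w - 2.0)
                   for w, s in zip(ws, grad_signs) if abs(w - 2.0) > 0.1])
print("spurious minima along the line:", spurious)
print("fraction of correct derivative signs:", correct)
```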